I.  Introduction 

 A.  Motivation is optimization - most of the time is spent in function calls and 
	in setting them up:  calling sequences, linkages, and the arbitrary order of 
	evaluating arguments.  
 B.  Diatribe on correctness 
	1.  distinction between fact and intent 
	2.  gcd example 
 C.  Definition of the concept of an algorithm 
	1.  tie in notion of equivalence 
	2.  equivalence is unsolvable in general case 
	3.  equivalence is subject initially to constraints imposed by 
		a.  flow analysis - worst that could happen approach 
		b.  understanding of assembly language - if every operation in 
			the LAP program is understood, then equivalence can be 
			determined 
		c.  definition of effects of functions - i.e. duplicate occurrences 
			(e.g. no decoupling of result and effect in PUTPROP while 
			yes for SET, SETQ, RPLACA, and RPLACD) 
 D.  An analysis of the properties of a programming language 
	1.  semantics of primitive operations 
	2.  semantics of primitive data types 
 E.  Why use LISP? 
	1.  a useful mechanism for describing an algorithm 
	2.  case analysis 
	3.  close relationship to a machine independent form (canonical form) 
 F.  Examples of equivalences that can be detected by the system 
	1.  See Chapter 1 of Winograd and/or Sussman for an example of how 
		to gently give a flavor of the power of a system.  By power we 
		mean an illustration of the type of equivalences and 
		optimizations that can be recognized.  
	2.  Use various encodings of NEXT, from compiler generated to the Savitsky 
		algorithm, with several intermediate steps such as my optimized 
		version and JRL's version.  Encodings should be annotated with 
		a register-transfer-like description so that the reader will not be 
		baffled by LAP.  
	3.  Cap off by showing that the system cannot recognize a reformulation of 
		the algorithm on a higher level.  This is the difference between 
		Algorithms 1 and 2.  
	4.  Demonstrate the ability of the system to pinpoint errors and to 
		indicate the approximate location of the error-causing operation.  
		Note the fact that currently correction is not attainable (point 
		out that this is a direction for future work - maybe too early?).  
	5.  At present HIER1 is the best example of error detection and correction.  
		Need to find a simpler example with a greater familiarity - i.e. 
		the meaning or purpose of the function should be clear.  
 G.  Type of equivalence handled 
	1.  refer to concept of algorithm 
	2.  loop unrolling without redundancy - use an example with REVERSE or, 
		even better, NEXT.  This will show that in a sense we define 
		a new function where the initial conditions are tested at the 
		end.  Draw the analogy to a DO UNTIL loop in ALGOL (a sketch 
		follows at the end of this section).  
	3.  equivalence between various assembly language encodings of a specific 
		algorithm by reducing them to a common form - i.e. the canonical 
		form.  The equivalence is shown by transitivity.  
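	Note - a sketch for item 2 in modern LISP notation (the definitions are 
		hypothetical; the actual NEXT is given in SYNPUB):  
		(DEFUN NEXT (L X)                  ; initial condition tested first 
		  (COND ((NULL L) NIL) 
		        ((EQ (CAR L) X) (CADR L)) 
		        (T (NEXT (CDR L) X)))) 
		(DEFUN NEXT1 (L X)                 ; unrolled - assumes L is non-NIL; 
		  (COND ((EQ (CAR L) X) (CADR L))  ; the callee's initial test is now 
		        ((NULL (CDR L)) NIL)       ; performed on (CDR L) at the end 
		        (T (NEXT1 (CDR L) X))))    ; of the loop body (DO UNTIL shape) 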
 H.  Basis of canonical form is McCarthy's paper 
	"A Basis for a Mathematical Theory of Computation" 
 I.  Comparison with Decompilation 
	1.  distinction is no pattern matching 
	2.  system is driven by an understanding system for assembly language 
	3.  no use of or search for a syntax in the assembly language code for 
		the function being worked on (a la Hollander).  
	4.  no use of pattern matching 
	5.  decompiling systems (e.g. Housel) have generally verified the 
		correctness of the decompilation process by testing the 
		decompiled function on sample data.  Our system 
		yields equivalence in all cases since the intermediate form 
		is derived by a symbolic execution of the assembly language 
		program under all possible conditions.  
	6.  We do not perform decompilation because compilation is a 1-many 
		operation and likewise for decompilation.  Whereas decompilation 
		systems use the original source program as the target or measure 
		of their success, we use an intermediate form which merely reflects 
		all of the computations performed.  
	7.  That no decompilation is going on can be quickly seen from the two 
		encodings of HIER1 or even NEXT.  
	8.  ACM 74 notes from session on decompiling (Nov. 13,1974) 
 J.  Subset of LISP handled 
	1.  EXPRs and FEXPRs 
	2.  No PROGS 
		a.  if no GO, then a PROG is equivalent to an EXPR 
		b.  GO can be incorporated by use of pseudo functions - i.e. 
			break down the function into subparts where each part 
			is between two labels (a sketch follows at the end of 
			this section).  
	3.  RPLACA and RPLACD operations 
	4.  Needless to say, only compiled LISP is handled.  This precludes the 
		assoc list mechanism of determining variable values.  In fact, 
		variable names exist only for global variables which have been 
		declared SPECIAL initially.  Local parameters - i.e. the 
		formal parameters - have names which do not exist anywhere outside 
		of the function to which they are local and cannot be accessed 
		by name in functions called by that function.  
	5.  SPECIAL variables declared on the outer level (see 4).  
	6.  SETQ to local and SPECIAL variables.  
	7.  Internal LAMBDA - i.e. common subexpressions 
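	Note - a sketch for item 2b in modern LISP notation (the function and all 
		names are hypothetical):  
		(DEFUN LAST1 (L)                   ; a PROG with a backward GO 
		  (PROG () 
		   LOOP (COND ((NULL (CDR L)) (RETURN (CAR L)))) 
		        (SETQ L (CDR L)) 
		        (GO LOOP))) 
		becomes an EXPR plus one pseudo function per label-delimited segment:  
		(DEFUN LAST1 (L) (LAST1-LOOP L)) 
		(DEFUN LAST1-LOOP (L)              ; the segment starting at LOOP; 
		  (COND ((NULL (CDR L)) (CAR L))   ; GO LOOP becomes a recursive call 
		        (T (LAST1-LOOP (CDR L))))) 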
 K.  Possible use of the system in the future 
	1.  Back end of an optimizing compiler - i.e. verify correctness of code 
		generated.  
	2.  A debugging system for compilers 
	3.  Computer-aided instruction for assembly language programming.  
 L.  Distinction between proving a compiler correct and our proofs for 
	individual programs.  The general problem is quite difficult, and the 
	current solution is quite satisfactory at a reasonable cost.  
 M.  Importance of machine description and independence of the proof procedure 
	and the host machine for the assembly language.  Conceivably, a different 
	computer's instruction set could be plugged into the system with relative 
	ease.  
 N.  Relationship to previous correctness approaches 
	1.  Assertions - difficulty is in mechanizing the creation of assertions.  
		Moreover, this concept is basically tied to the "intent" school 
		of correctness (Feb. 1,1974) 
	2.  Predicate Calculus diatribe (Feb. 23,1974) 
	3.  Data base of the rederivation process (this is name given to the 
		procedure of going from assembly language to an intermediate form) 
		is basically a model of all of the computations performed up 
		to the current instruction.  At each encounter of an instruction, 
		the current data base serves as a precondition, the instruction 
		as an operator, and the new data base as the postcondition.  This 
		leads us to view the data base as an assertion about the 
		computation (vague and maybe unnecessary?). 
	4.  Do not use input output pairs to show equivalence (this is basically 
		how Housel showed that his decompilation is correct).  We feel that 
		this is too limiting since a proof is only good for its data.  
 O.  The rederivation procedure works for compiler-generated code as well as 
	handcode, subject to constraints such as:  
	1.  Results of a function are returned in a specific location 
	2.  Arguments to a function are found in specified locations 
	3.  The function is limited as to the parts of memory it can access 
		a.  program 
		b.  registers 
		c.  stack 
	4.  Note that the restrictions basically pertain to calling sequences and 
		protection.  
 P.  System can also be viewed as an understanding system for assembly language 
 Q.  Full analysis of implications of EQ, EQUAL, and ATOM 
 R.  Equivalence is proved by use of a normal form rather than a canonical form 
 S.  Correctness of optimization is a valid motivation for this project since a 
	measurement system showed that optimizations such as those the system 
	is capable of recognizing lead to significant time and space improvement 
	(see HIER1).  In fact, one of the results of this study is that the 
	best calling sequence is not a stack or registers, but a combination of 
	the two.  This is where confusion starts and the need for a system such 
	as the one proposed becomes more acute.  
 T.  Relation to structured programming (see Knuth notes Sept. 24,1974) - i.e. LISP 
	is a structured programming language and we show equivalence through use of 
	suitable transformations.  
 U.  Flow analysis poses limitations on the knowledge of the effects of functions.  
 V.  Literature Survey 
	1.  McCarthy 
	2.  Floyd 
	3.  King 
	4.  Allen  
	5.  Hollander 
	6.  Housel 
	7.  Schneider 
	8.  Optimization via syntactic rather than semantic methods (Oct. 26,1974) 
 W.  One of the main reasons for the success of the system is the proper delegation 
	of tasks between the proof procedure and the rederivation procedure.  
	Some examples are:  
	1.  β-reduction - this is a fancy name for the process of substituting the 
		evaluated bindings of internal LAMBDA variables for the occurrences 
		of the variables in the original program.  The numerical 
		representation of the function enables adherence to call by 
		value despite this substitution (a sketch follows at the end 
		of this section).  
	2.  α-conversion  this is a fancy name for renaming of variables.  In 
		our case this occurs whenever we have a SETQ to a local variable, 
		we replace the local variable by its bound value.  This is valid 
		since the name has no meaning.  Actually, the distinction between 
		our use of β-reduction and α-conversion is hazy and should 
		probably be scrapped.  
	3.  δ-reduction is exhibited by the avoidance of computing expressions and 
		conditions whose values are already known.  The most common 
		example of this situation is the replacement of (T→p,q) by p 
		and likewise for (F→p,q) by q.  This technique is applied in both 
		the rederivation procedure and the proof procedure.  
	4.  The importance of 1. and  2. lies in the absence of names in the 
	assembly language program.  Thus we are able to reconcile the 
		absence of names in the rederived form with their use in the 
		intermediate form.  
	5.  Distinction between value and effect of a computation (SETQ, RPLACA, 
		RPLACD)  
	6.  Jumps not to the start of the program and loop unrolling 
		a.  proof that start can be bypassed is done in rederivation 
		b.  proof that loop unrolling is OK even though condition 
			rearranging has taken place is done in proof procedure.  
			Specifically, note the use of manipulating the canonical 
			form and the rederived forms at the appropriate times to 
			yield the most "effective" proof method.  
	7.  Others? 
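	Note - a sketch for items 1 and 2 (hypothetical example):  the internal 
		LAMBDA expression ((LAMBDA (Y) (CONS Y Y)) (CAR X)) is treated as 
		(CONS (CAR X) (CAR X)), where both occurrences of (CAR X) carry the 
		same computation number; thus (CAR X) is still computed only once 
		and call by value is preserved despite the substitution.  Similarly, 
		after (SETQ Z (CDR X)) subsequent references to the local variable Z 
		are replaced by (CDR X) itself, since the name Z has no meaning.  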
 X.  A function has two representations:  
	1.  symbolic 
	2.  numeric 

 Y.  We deal with the problem of equivalence of two encodings of an algorithm.  
	We do not focus 
	any attention to the correctness of the algorithm.  However, in the 
	process of proving the equivalence of two algorithms, we have in fact 
	proved the correctness of the translation process from one encoding of 
	the algorithm to the other.  The translation process is shown to be 
	correct for one specific case.  Note that we cannot infer the 
	correctness of the translation process in general from this result.  
	However, this type of a procedure is useful in proving that a compiler 
	is correct since it need only be shown correct for the inputs supplied 
	to it.  This point is often overlooked by other workers in the field.  
	In a sense it could be said that we have bootstrapped ourselves to 
	the state where we can attribute a state of "effective" correctness 
	to a compiler by merely demonstrating that its output is equivalent 
	to its input - a rather powerful step!  

 Z.  In the future PROGs can be implemented along two avenues 
	1.  Break up function into subunits of code between labels and CONDs 
		(pseudofunctions).  Compile the pseudofunctions individually.  
	2.  Use pseudofunctions, but when the rederivation procedure encounters a 
		previously seen label to which a redundant path cannot be found, 
		it makes sure that all locations referenced in the future have 
		valid contents (more later).  The problem is with associating names 
		with the locations (see October-November 1973 notes).  
AA.  Peter Deutsch - December 14,1972:  "If a program can't be verified, then either 
	it is wrong or one has inadequately specified the assertions."  He feels 
	that this is all that we can ask from the system.  Deutsch proves correctness 
	by means of assertions.  
AB.  Conclusion:  one of the main outgrowths of this work is the impetus it 
	provided for scrutinizing LISP.  The end result is the definition of CMPLISP 
	and its implementation - protection.  The concept of free variable is vastly 
	overblown - 
	global is sufficient as evidenced by the system {similarly for GO to 
	backward locations}.  
AC.  System provides a framework for proving optimizations correct.  An interesting 
	piece of future work:  when some optimization cannot be handled, explain why.  
AD.  Proposal for Research:  Implement CMPLISP, an optimizer, inter-function proofs, 
	and extension to Interlisp?  

Rough Outline 

I.  Introduction - demonstrate power and ideas - go through various optimization 
	examples (can use SYNPUB examples as well as various NEXT encodings) and 
	show types of optimizations we can prove equivalent.  
II.  Perspective - Literature Survey, and discussions on correctness, optimization, 
	and decompilation.  
III.  CMPLISP 
IV.  Canonical form and transformation of CMPLISP programs to it.  
V.  Rederivation - machine description, understanding system, problems with 
	equality, etc.  
VI.  Matching 
VII.  Examples 
	A.  REVERSE 
		1.  canonical form 
		2.  rederived form 
		3.  proof of equivalence 
	B.  NEXT 
		1.  canonical form 
		2.  rederived program 
		3.  proof of equivalence 
	C.  HIER1 
		1.  LISP encoding 
		2.  unoptimized LAP 
		3.  optimized LAP 
		4.  optimized LAP with errors 
			a.  point out errors 
			b.  show how system fails to prove equivalence to LISP by 
				showing error messages and annotate them - not too 
				much detail (just discuss the type of error) 
VIII.  Conclusions and Future Directions 
	A.  indicate strong points - i.e. what realizations enable us to derive 
		power 
	B.  future 
		1.  allow recursive calls to points past the start other than 
			self - i.e. closure mechanism in EL1 (Nov. 30,1973) 
		2.  would like a general mechanism of inputting relations between 
			functions in terms of general variables.  This can only be 
			handled as is presently done by creating instances of the 
			identities whenever they are true and entering them in the 
			data base.  Problem is that data base becomes too big - OK 
			for limited identities.  More difficult for rules which 
			involve uses of several functions - i.e. associativity would 
			result in the creation of a large number of relations - i.e. 
			coupled with commutativity a*b*c would have 8 relations.  
		3.  better or more general equality algorithm  
		4.  use of analysis of EQ, EQUAL, and ATOM corresponds to a theorem 
			prover for LISP.  Would like to extend to bit manipulations 
			as well as arithmetic and ?.  Basically, this depends on the 
			domain of the programs.  For arithmetic we only have 
			commutativity.  Clearly, would also like associativity.  
			Need a mechanism for declaring equalities in schema form 
			and recording them in such a manner rather than present 
			recording of all instances.  For example, don't know about 
			associativity of APPEND.  
		5.  Description of additions to the computation model - i.e. suppose 
			a parameter stack is used, then system will still work.  The 
			addition is a data structure PSTACK (Interlisp uses a 
			parameter stack).  By addition we mean that program can 
			translate input constraints such as preservation of 
			registers or stack upon a function call into code.  
		6.  Environment other than LISP - see Fibonacci in SYNPUB with a 
			regular number representation rather than LISP.  
		7.  Can use system to mark instructions that never get executed.  
			With this knowledge, the space required can be reduced 
			and furthermore, the code can be improved as a result of 
			such information.  See Nievergelt (Feb. 1973) and redundant 
			code notes of Nov. 5,1974.  
		8.  DWP comments in notes of Oct. 28,1974.  

IX.  Appendix describing LAP 
	A.  LAP - include CALLF, JCALLF 
	B.  instructions 
		1.  verbal 
		2.  program 
		3.  primitives not previously explained 
	C.  Detectable errors in LAP (give a listing of error messages from 
		emiterror) 
X.  Manual to show that system really works 
XI.  Bibliography 

I.  Introduction 

A.  indicate that all numbers used in examples and elsewhere are in OCTAL (base 8) 
B.  Examples 
	1.  introduce NEXT - algorithm 1 (use SYNPUB discussion) 
		a.  compiler code 
		b.  handcode 
		c.  equivalence 
	2.  algorithm 2 for NEXT - describe features of various encodings 
		a.  compiler code 
		b.  handcode 
		c.  JRL 
		d.  Savitsky 
		e.  equivalence 
	3.  REVERSE - one argument 
		a.  compiler 
		b.  handcode 
	4.  REVERSE - two argument 
		a.  compiler 
		b.  handcode 
		c.  RPLACA 
			i.  incorrect (in line RPLACA without checking ATOM) 
		       ii.  correct 
		d.  incorrect - do I have one? (maybe wrong stack order) 
		e.  interchanging first with second argument 
			i.  incorrect (HRRZS@) 
		       ii.  correct 
	5.  discussion 
		a.  1 to 2, 3 to 4, and 4 to 4e represent user manipulations of 
			assembly code once the optimizer has done its work.  
		b.  mention stepwise refinement 
		c.  like Knuth's "Structured Programming with GO TO Statements" and see my 
			notes 
		d.  these transformations show need for correctness 
	6.  optimizations 
		a.  inner loop faster 
		b.  order of argument evaluation rearranged 
		c.  make use of results of tests (algorithm 2 for NEXT) 
		d.  bypass start of program on recursive call 
		e.  calling sequence 
	7.  what the system can and cannot do 
		a.  cannot convert algorithm 1 to algorithm 2 for NEXT 
		b.  cannot convert non-list-modifying to list-modifying for REVERSE 
		c.  One-argument to two-argument REVERSE cannot currently be done 
			although we could make use of schema-like transformations as 
			shown in SYNPUB for REVERSE.  Similarly for FIBONACCI.  
		d.  cannot rearrange the order of parameters for a function as in 4e 
		e.  can detect errors 
	8.  maybe use other examples in SYNPUB - i.e. SUBB 
	9.  conclusion 
		a.  we provide a framework for proving optimizations such as these 
			correct 
		b.  current scenario envisions the user sitting at a terminal and 
			interactively optimizing his program with simultaneous 
			proofs 
		c.  good for speeding up inner loops as shown for NEXT and REVERSE 
		d.  in future build into an optimizer or incorporate with it 
		e.  we are able to handle all occasions where the basic algorithm 
			does not change 

IV.  Canonical Form (actually normal) 

A.  JMC definitions and MATCMP paper 
	1.  defines a canonical form 
	2.  becomes a normal form when introduce knowledge of equality (EQ) 
	3.  JMC proves existence of canonical form for gbf's (generalized Boolean forms) 
	4.  we make extensions to incorporate LISP 
	5.  axioms become qualified by use of flow analysis information 
B.  Inclusion of side-effects 
	1.  need a representation in terms of order of computation 
	2.  numbering algorithm 
	3.  application of axioms need checks for redundancy 
	4.  flow analysis and limitations - i.e. assume worst can happen
C.  Extensions to handle constructs 
	1.  COND 
	2.  Multiple results (FN) 
	3.  Internal LAMBDA 
	4.  NULL 
	5.  AND, OR 
	6.  PROG 
D.  Distinction between value and effect 
	1.  RPLACA 
	2.  RPLACD 
	3.  SETQ to SPECIAL variables 
	4.  SET (?) 
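	Note - an illustration (standard LISP semantics; the variables are 
		hypothetical):  
		(RPLACA X (QUOTE C))	; value - the cell X itself; effect - the head 
					; of a List Structure element is changed 
		(SETQ SPC (CAR X))	; value - (CAR X); effect - the SPECIAL cell 
					; SPC is updated for subsequent readers 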
E.  Implicit and explicit conditions (part of axiom ?) 
	1.  ATOM 
	2.  EQ 
	3.  EQUAL 
F.  Maintenance of the order of computing tests, both explicit and implicit - axiom 
	application still maintains order 
G.  Conversion of breadth first to depth first - functions must have higher 
	computation numbers than their arguments.  
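	Note - a hypothetical illustration of the constraint:  for 
		(CONS (CAR A) (CDR B)) one numbering consistent with depth first 
		order is A=1, (CAR A)=2, B=3, (CDR B)=4, and the CONS itself 5; a 
		breadth first scheme, which would number the CONS before its 
		arguments, must be converted to this form.  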
H.  Algorithm F for determining bindings of variables 
	1.  SETQ to local variables and use of variable bindings 
	2.  LAMBDA variables 
	3.  PROG variables 
I.  Duplicate Predicate Removal Algorithm 
	1.  based on properties of functions in CMPLISP section 
	2.  flow analysis used 
	3.  redundant storage on first instance - see definition in CMPLISP; 
		this means on all paths 

VI.  Matching 

A.  why not lexicographic minimum (Feb. 22,1974) 
B.  loop unrolling and loop economy discussions 
C.  modifications to REDERIVE - PASS4 
	1.  The powerset of labels branched to 
		in the backward direction will yield one rederived program for 
		each element of the powerset 
	2.  Question:  When determining argument values should we use 
		matching as in the matching phase or stick to redundant 
		computations.  Think about it and try to justify choice.  
	    Answer:  Redundant computations is the right solution since 
		if a computation is repeated in the beginning of a function 
		as well as prior to the call, then its elimination  (or 
		bypassing) may be wrong.  For example, suppose a function 
		known to read and modify a special variable fits this 
		criterion.  Moreover, the arguments are identical.  In this 
		case, its bypassing would be a mistake.  The function could 
		be of a nature that increments the SPECIAL variable.  This 
		should only be mentioned in Chapter 6 once the distinction 
		between matching and redundancy is explained.  
D.  Unspecified parameters 
E.  Pinpointing errors 
	1.  computations out of order 
	2.  missing computations - a value was computed in the rederived form 
		which was not specified in the canonical form or was computed out 
		of order.  This could result from loop unrolling of a recursive 
		call or from an error in the assembly language program.  
	3.  indicate when a location was garbaged - storage history field in 
		data descriptor 
	4.  missing condition 
	5.  mismatching conclusion - the conclusion in the rederived form does 
		not match the conclusion as specified by the canonical form.  
	6.  missing FN list computation in rederived form (i.e. a computation 
		not used as a subexpression of a predicate or a result was not 
		computed by the assembly language program).  
	7.  unknown conclusion in the rederived form resulting from an error in 
		the assembly language program.  
	8.  anything else? 
	9.  illustrate by examples 
F.  Matching algorithm and various passes 
	1.  use induction for backward jumps 
	2.  termination 
	3.  Sometimes manipulate canonical form and sometimes rederived form.  
		a.  We take the basic approach that the rederived program represents 
			a program and we wish to manipulate the canonical form in 
			legal ways to obtain a match.  When this is done, we say 
			that the programs are equivalent.  This will account for 
			rearranging of conditions as well as the order of performing 
			computations (i.e. order of argument evaluation).  Thus by 
			finding equivalence preserving transformations we are 
			proving that the rederived form is equivalent to the 
			canonical form.  
		b.  We manipulate the rederived form to match the canonical form 
			when we try to match backward jumps.  The canonical form 
			consists of the already manipulated version.  This is 
			because we will be trying out various versions of 
			abbreviated rederived forms with induction.  
		c.  The above is necessary especially when conditions have been 
			rearranged.  Problem here is that if we use the original 
			canonical form, then conditions have not been rearranged.  
			However, when we expand recursive call mismatches we use 
			the rederived forms plugged into the mismatching function 
			calls in the canonical form since by induction, the 
			canonical form has already been reordered to correspond 
			to the order implied by the rederived form.  
G.  Proof procedure is machine independent and likewise for intermediate 
	representation.  
H.  Matching 
	1.  no backward jumps 
		a.  match rederived against canonical manipulating canonical 
		b.  numbering 
		c.  computations 
			i.  explicit 
		       ii.  implicit 
		d.  conditions 
			i.  EQ, ATOM 
		       ii.  EQUAL and check redundant since EQUAL is not primitive 
		e.  terminal node 
I.  Give NEXT as an example of proof 

A.  definition of matching 
B.  matching algorithm 
C.  loop unrolling versus loop economy 
D.  changes to rederivation process 
E.  modification for backward jumps 
F.  type of errors that can be detected 

A.  Introduction 

Once the original LISP program has been converted to the canonical form and the 
LAP program converted to the rederived form, we are ready to attempt to prove the 
equivalence of the two.  From previous discussions and examples it should be clear 
that there is no valid reason for assuming that the outputs of the two procedures 
are identical (symbolically and to a lesser degree numerically).  

One possible solution is to find some representation of the two forms which is in 
terms of a lexicographic minimum.  This would remove the problem that is associated 
with the use of substitution of equals for equals.  Such a solution would 
imply that we could find a base representation for the rederived form and likewise 
for the canonical form and then simply perform a strict equality match.  However, 
such a solution fails to take into account the possibility of condition rearranging 
which does not preserve the lexicographic ordering property.  For example, consider 
***Feb. 22, 1974 discussion*** 

Moreover, the lexicographic minimum does not solve problems associated with 
rearranging the order of computation of various functions as well as deal with 
the concept of recursion via jumps to points past the start of the program.  

In the process of proving equivalence we will have to show that neither the canonical 
nor the rederived form reflects computations not performed in the other.  We 
have already addressed ourselves in part to this problem in the previous chapters 
when we discussed the removal of duplicate predicates and computations from the 
canonical and rederived forms.  Recall that at the end of the rederivation 
and canonical form procedures, we removed all redundant first instances of 
assignment.  This was needed to ensure that assignments into elements of the List 
Structure were not done for the purpose of temporary storage of results - i.e. the 
location could not be subsequently referenced with the specific value as its 
contents.  Occurrences of functions in one form and not in the other will lead 
to inequivalence unless the functions are of a primitive nature since they can be 
introduced at will.  

For example, consider the case of a computation of CAR(<expr>) which 
can not be matched even though CDR(<expr>) was matched or vice versa.  In 
this case we currently declare inequivalence, since a 
computation is performed in one form and not in the other.  Actually, 
we should say that if CAR (or CDR) is legal then so is CDR (or CAR).  
In other words the operation becomes primitive in such a case.  This is 
because no side effect can occur - i.e. recall that the problem was that CAR 
and CDR are not defined when their argument is non-atomic.  But the act of 
computing one of them safely implies that the other can also be 
computed safely.  This problem is quite common in a LISP implementation 
where a word represents more than one LISP entity - i.e. a pointer to CAR 
in the left halfword and a pointer to CDR in the right halfword.  In such a 
case we may access an entire word when in fact we only desire a specific half.  
A similar consideration is if ATOM(A) is known not to be true, then CAR(A) and 
CDR(A) are also safe provided that they are performed after the ATOM test.  
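
In PDP-10 terms (a sketch; the representation is the one just described, with a 
pointer to CAR in the left halfword and a pointer to CDR in the right halfword):  

	HLRZ 1,(2)	; CAR - fetch the left halfword of the cell addressed by AC 2 
	HRRZ 1,(2)	; CDR - fetch the right halfword of the same cell 

Both instructions reference the same memory word, so if one of them can be executed 
safely then so can the other.  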

One solution is to place CDR(A) on the FN list (and on UNREFERENCED) whenever 
CAR(A) is seen and similarly for CAR(A) and CDR(A).  Also place CAR(A) and 
CDR(A) on the FN list  (and on UNREFERENCED) whenever ATOM(A) is known to be 
NIL (i.e. false).  The only problem is that when we remove redundant first 
instances of computation (see canonical form and also the postprocessing stage 
of rederived form) involving modification of heads or tails of elements of the List 
Structure, and if an 
access to the head or tail appears as an argument in a non-result slot to an 
FN construct, then it is not considered as an access operation.  

We propose a solution for a CAR(A) operation that cannot be matched.  This 
procedure is applied prior to removing redundant first instances of a 
computation involving modification of heads or tails of elements of the List 
Structure at the end of the canonical form process described in 
Chapter 3 and during the postprocessing stage of the rederivation process.  
It should be noted that the same solution would hold for CDR(A) 
(actually need to replace CAR by CDR and CDR by CAR in the discussion).  
Examine the non-result 
arguments of FN constructs for instances of CAR(A).  If all such occurrences 
appear as non-result arguments of FN constructs, and if CDR(A) appears in 
a predicate or in the result of a path containing CAR(A), then CAR(A) can 
safely be removed from the FN list.  By safely removed we mean that its 
computation qualifies as a primitive operation.  We leave the case of ATOM 
for future work although it is clear that it would be handled in much the 
same manner.  

Our solution to the equivalence problem is to transform one form into the other.  
We have two choices of operation.  We can show via the use 
of equivalence preserving transformations (the axioms and substitution) that the 
canonical form can be transformed into the rederived form or vice versa.  However, 
this is not enough; we must also demonstrate that neither form reflects any 
computations not performed in the other.  Thus we will perform the matching process 
twice; once manipulating the canonical form to match the rederived form, and next 
manipulating the rederived form to match the result of the previous manipulation of 
the canonical form.  

The remainder of the chapter expands on the previous notions.  We first present 
the matching algorithm in terms of transforming the canonical form to match the 
rederived form.  We next discuss the issues relevant to proving equivalence when 
recursion has been implemented via jumps to points other than the start of the 
program.  This is coupled with modifications to the rederivation process as well 
as the matching procedure just presented.  Finally, the types of errors that can 
be discovered are briefly discussed.  

***has matching been defined yet?*** 

The matching procedure consists of manipulating the canonical form to obtain an 
identical form to the rederived form.  This is done by examining the rederived 
form in order of increasing computation numbers and finding what will be known as 
"matching" computations.  We use a variant of the canonical form algorithm for 
strong equivalence described in Chapter 4.  The difference is that in addition to 
predicates being rearranged via axiom (8), the two forms may differ because 
computations may be computed out of sequence as well as the use of substitution 
of equals for equals.  Thus we see that the concept of inevitable predicate must be 
extended to include all computations.  Basically, for each computation performed in 
the rederived form, we must ensure that it is also computed in the canonical form.  
This means that if a computation appears in a condition (non-terminal node) in the 
rederived form, then it must either appear in the corresponding condition in the 
canonical form, or in each of the subtrees of the unmatched canonical form.  Of 
course the same criterion holds for computations in terminal nodes.  

Recall that the numerical representation of a function serves to indicate a relative 
ordering between the various computations in terms of their instance of computation.  
Thus the magnitudes of the numbers are seen to be insignificant.  We will make use 
of the following numbering scheme which will be seen to have certain 
desirable properties.  In order to perform the matching process we start with all 
computations in the canonical form having higher numbers than those in the rederived 
form.  The only exceptions are atoms (including local parameters but excluding 
SPECIAL variables) which are assigned identical computation numbers in the two 
representations.  

Each computation in the rederived form must be proved to be inevitable in the 
canonical form.  This means that the computation or its equivalent must be shown to 
be computed in 
the canonical form.  Moreover, its instance of computation in the canonical form 
must be shown to yield an equivalent result to the value computed in the rederived 
form.  This process is termed finding a matching instance.  The criteria for 
matching are given below.  

1.  If the function reads and modifies a SPECIAL variable, then there must be no 
intervening reading or modification of the said variable.  

2.  If the function only reads a specific SPECIAL variable, then the said variable 
must have the same values prior to the two instances of the function.  

3.  If the function only modifies a specific SPECIAL variable, then there must be no 
intervening reading or modification of the said variable.  

4.  If the function reads and modifies heads (or tails) of elements of the List 
Structure, then there must be no intervening reading or modification of the said 
part.  

5.  If the function only reads heads (or tails) of elements of the List Structure, 
then there must be no intervening modification of the said part.  

6.  If the function only modifies heads (or tails) of elements of the List 
Structure, then there must be no intervening reading or modification of the said 
part.  

7.  The function may perform a CONS operation.  
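
As a hypothetical illustration of criterion 5, the second (CAR A) below is not a 
matching instance of the first, because the intervening RPLACA modifies heads of 
elements of the List Structure:  

	(SETQ X (CAR A)) 
	(RPLACA A B) 
	(SETQ Y (CAR A))	; cannot be matched with the first (CAR A) 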

The criteria for matching are quite similar to those used to determine duplicate 
computation.  However, there are several differences.  Operations resulting in 
the modification of SPECIAL variables and the List Structure may also read the 
said items.  Also, an LCONS operation is permissible.  Without these 
differences, constructs employing such functions could not be matched up.  

The key to the matching procedure is that whenever a computation, say A, is matched 
with a computation not previously encountered, say B, then all occurrences of B in 
the canonical form are replaced by A.  Similarly for the SPECIAL variables modified 
by A - i.e. the computation number associated with SPECIAL variables modified by A 
replaces the computation number associated with SPECIAL variables modified by B.  
This means that during the matching process 
all computations in the canonical form with computation numbers higher than those 
in the rederived form have not yet been matched.  Moreover, at the end of the 
procedure we will have managed to transform the canonical form into the rederived 
form if equivalence holds.  
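
A minimal sketch of this replacement step in modern LISP notation (not the actual 
implementation; computations are assumed to be represented as S-expressions):  

	(DEFUN RECORD-MATCH (A B CANONICAL-FORM) 
	  ;; A (from the rederived form) has just been matched with B; replace 
	  ;; every occurrence of B in the canonical form by A 
	  (SUBST A B CANONICAL-FORM)) 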

The process of finding a matching computation is quite straightforward.  We process 
the rederived form in increasing order of computation number.  For each function, 
say A, search the canonical form for a matching instance.  Recall that all 
computations in the canonical form with a computation number less than the number 
associated with A have already been matched.  The matching instance 
can be encountered in two ways.  An explicit occurrence is defined to be a function 
which has not been previously encountered in the canonical form.  An implicit 
occurrence is a function which has already been previously encountered when matching 
other computations.  This results from use of equality information and subsequent 
substitution of equals for equals.  For example, suppose that we are looking for 
(CDR B).  (CDR A) has already been computed and matched.  In the true subtree of 
the condition (EQ A B) we see that (CDR A) is EQ to (CDR B) and thus (CDR A) is a 
matching instance of an implicit nature.  Thus all that remains is to search the 
false subtree for an explicit occurrence of (CDR B) or another implicit equivalent.  

As soon as all of the arguments of a predicate, say p, are matched up we check if it 
is a primitive function.  If yes, then apply the following transformations to the 
canonical form, say CAN, corresponding to the part of the rederived form currently 
being processed.  

1. Replace CAN by (p→CAN,CAN).  This corresponds to application of axiom (1).  

2. Apply the breadth first to depth first renumbering algorithm to the result of 1.  
This is necessary because the conclusion and alternative clauses have the same 
numeric representation after application of 1.  Moreover, subsequent applications 
of matching assume that the canonical form has the properties that all functions 
with the same computation number have been computed simultaneously, and that all 
computations computed in the left subtree have a higher computation number 
associated with them than those computed in the right subtree.  

3.  Apply the duplicate computation removal algorithm to the result of 2.  This 
corresponds to the application of axioms (5) and (6) as indicated in the canonical 
form algorithm for strong equivalence given in Chapter 4.  
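
As a hypothetical illustration of steps 1 through 3, suppose p is (ATOM X) and CAN 
is (CONS X Y).  Step 1 yields ((ATOM X)→(CONS X Y),(CONS X Y)); step 2 renumbers 
the result so that the conclusion and alternative clauses no longer share 
computation numbers, with the left subtree receiving the higher numbers; step 3 
then removes any computations rendered redundant by the assumed truth or falsity 
of (ATOM X) in the respective subtrees.  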

We did not need to match primitive predicates because by definition primitive 
functions can be considered as inevitable computations.  However, if the predicate 
does not correspond to a primitive function, then we must ensure that it is indeed 
inevitable.  We choose a different method for proving inevitability.  We attempt to 
prove that at each terminal node of the canonical form, the predicate is redundant.  
If the latter is true, then clearly the predicate is inevitable; otherwise it is 
not.  The reason for use of such a procedure is best exemplified by the predicate 
EQUAL.  In this case, if we were to look for an occurrence of the predicate in the 
same manner as was done for functions, then the situation may arise that we cannot 
match it.  For example, consider (EQUAL A B) in the rederived form, and a canonical 
form consisting of the tree in Figure ? with the left and right side representing 
the true and false cases respectively of a predicate.  

		(ATOM A) 
		/      \ 
	       /	\ 
	(ATOM B)	(EQUAL A B) 
	 /    \     	  /	\ 
	/      \	 /	 \ 
   (EQ A B)  COMP_3   COMP_4   COMP_5 
    /	 \ 
   /	  \ 
COMP_1  COMP_2 

In this example we see that the predicate (EQUAL A B) is redundant by the time 
COMP_1, COMP_2, COMP_3, COMP_4, and COMP_5 are performed.  However, if we were to 
use our matching algorithm to detect matching instances of the predicate, then we 
would lose.  Note that we have made use of the implications of the various 
predicates as outlined in Chapter 3.  Actually, our statement about a loss is not 
quite correct since a value of the predicate of T or F can be considered as an 
implicit occurrence.  Nevertheless, we shall stick with our treatment.  Finally, 
we point out that this example illustrates how problems pointed out in the 
discussion of the lexicographic minimum approach to matching do not surface when 
equivalence is proved using our matching techniques (recall problem in 
distinguishing between occurrences of EQ and EQUAL as well as making use of 
such relationships).  

Once a predicate has been matched we proceed to apply the matching procedure to 
the conclusion and alternative clauses of the rederived form.  Once the entire 
rederived form has been processed by the canonical form algorithm, we see that 
all computations performed in the rederived form have been matched (i.e. shown 
to be equivalent) with computations in the canonical form.  However, we have not 
yet proved that all computations performed in the canonical form have also been 
performed in the rederived form.  This can be done by applying the matching 
procedure with the roles of the canonical form and rederived form reversed - i.e. 
manipulate the rederived form to match the canonical form.  This problem will 
disappear once we present, in a subsequent section, the method for handling matching 
of functions with recursive calls implemented as jumps to points other than the 
start of the program.  

From the discussion we see that the proof procedure is machine independent.  Also 
notice how the various properties of the depth first numbering scheme are used to 
enable the matching procedure to correspond quite closely to the duplicate predicate 
removal process.  In fact many of the same routines are used in the implementation.  

D.  modification for backward jumps 

***indicate earlier that we may manipulate the canonical form to match the 
rederived form or vice versa*** 

1.  Perform as much matching as possible by manipulating canonical form to match 
the rederived form.  By as much as possible we mean that if a mismatch occurs on 
one path, then process the alternate path in the same manner.  The result of the 
procedure is a transformed canonical form, say CANRESULT.  

2.  If no mismatches occurred then we have successfully 
managed to prove that the canonical form can be transformed in a legal manner to 
match the rederived form.  

3.  At least one mismatch has occurred.  There are several possible causes.  

a.  Computations were encountered that could not be matched.  This could have been 
caused by the expansion of a recursive call as in the case of loop unrolling or 
by an error.  The exact reason is resolved by the remainder of the algorithm.  

b.  An attempt was made to match a result computation (i.e. a terminal node) in the 
rederived form against a condition (i.e. a non terminal node) in the canonical form.  
If the condition is a predicate of a primitive nature having identical conclusion 
and alternative clauses, then the remainder of the algorithm will apply axiom (1).  

Renumber both the original rederived form 
and the canonical form resulting from step 1 - i.e. CANRESULT.  However, this time 
the rederived form receives the higher numbers.  This renumbering is the algorithm 
mentioned in the previous section.  At this point we attempt to manipulate the 
rederived form to match the canonical form.  Note that this is the reverse of the 
procedure applied in step 1 because we wish to detect the exact instance of the 
mismatches so that we can determine if they are caused by recursive calls in the 
canonical form which have been expanded in the rederived form via loop unrolling.  
If all mismatches are of this nature, then proceed to step 4.  Otherwise, the 
algorithm terminates with a result of inequivalence or an inability to prove 
equivalence.  

All mismatches that 
were caused by case b. with a primitive predicate and identical conclusion and 
alternative clauses no longer exist because the matching algorithm applies 
axiom (1) followed by axioms (5) and (6).  For example, suppose we try to match 
(CDR A) with ((EQ (CDR A) F)→F,(CDR A)).  In this case, the equality does hold.  
If the former is in the rederived form and the latter is in the canonical form, 
then when attempting to manipulate the canonical form to match the rederived 
form, the matching procedure will fail and declare a mismatch because of case b.  
The next attempt at matching will manipulate the rederived form to match the 
canonical form.  We take advantage of the fact that EQ is a primitive 
predicate and that the computations (CDR A) and F are inevitable to replace 
(CDR A) in the rederived form by ((EQ (CDR A) F)→(CDR A),(CDR A)).  The equivalence 
should be obvious.  Note that no special handling was required; this is simply 
part of the matching algorithm since it involves the application of axiom (1).  
If in fact the condition was in the rederived form and the result in the 
canonical form, then step 1 would not have detected any mismatch since axiom (1) 
would have been applied as indicated here.  

Step 3 is necessary since in the case of mismatches due to loop unrolling, step 1 
will detect a mismatch when a computation is performed in the rederived form and 
not in the canonical form or vice versa.  Most often, the mismatch is of the 
former type and is usually in the condition although it may also be in the result 
clause.  We give an example of a mismatch occurring because a computation 
performed in the canonical form was not performed in the rederived form.  
Consider the optimal encoding for the function NEXT.  In this case we will 
detect a mismatch when attempting to match ((EQ (CDR L) F)→F,(NEXT (CDR L) X)) 
in the rederived form with (NEXT (CDR L) X) in the canonical form.  The first 
step is to apply axiom (1) to (NEXT (CDR L) X) in the canonical form since 
(CDR L) and F are inevitable computations and EQ is a primitive predicate.  
The resulting form is ((EQ (CDR L) F)→(NEXT (CDR L) X),(NEXT (CDR L) X)).  
We now must match the conclusion and alternative clauses.  The alternative 
clause is seen to match; however, the conclusion clause does not match - i.e. F 
is not matched by (NEXT (CDR L) X).  Application of step 3 will result in an 
inability to match (NEXT (CDR L) X) with the condition in the rederived form.  
Thus we see that step 3 will indicate if the cause of the mismatch was loop 
unrolling of a recursive call.  

***indicate that examples neglected to mention numeric representation*** 

Still left:  

1.  Subset of CMPLISP handled does not include PROGs (can include PROGs with a 
	restricted non-backward GO).  
2.  modify EQUAL.PUB to include ATOM (significant update) 
3.  diagrams of environment in CHAP3.PUB 
4.  DERIV6 and CANON6 changes:  
	a.  in CANON6 change all use of FUNCTION to QUOTE 
	b.  in CANON6 and DERIV6 change all QUOTE to QUOTEATM if argument is an 
		atom and to QUOTELST if argument is a list.  
		This is done upon input (GETINPUT in DERIV6 and SUBSTITUTE in 
		CANON6).  Also change OUTPUTLAP in DERIV6 to convert QUOTEATM 
		and QUOTELST back to QUOTE for outputting the lap code.  
	c.  whenever test for QUOTE we should test for MEMBER in QUOTED where 
		QUOTED←<'QUOTEATM,'QUOTELST> 
	d.  QUOTELST has the 'LCONS property 
	e.  make sure parse routines know about QUOTED 
	f.  zero computation numbers are still assigned to the components of 
		the argument to QUOTE 
	g.  PRUNEFN and CHECKREDUNDANT no longer mess around with QUOTE 
	h.  check all other uses of QUOTE 
5.  Implement SET in DERIV6 and CANON6 (major piece of work).  
6.  Calls to FEXPRs are not yet implemented in DERIV6 and CANON6 and will not be 
	included in the thesis - left for future work (only QUOTE is implemented).  
7.  Check last part of CHAP3.PUB to make sure the mention of all bits being 1 for an 
	atom is not out of place.  Basically, want to eliminate reference to NIL being 0 
	internally except when defining the new property list format.  
8.  CMPLISP means that REMPROP, PUTPROP, REMOB, . . . do not have list modification 
	effects.  Therefore must change for flow analysis.  
9.  No need for termination test in flow analysis.  
10.  In CANON6 STEP3ASSIGN:  (SETQ A (SETQ B C)) is not done right - must check if 
	the second argument is a SETQ and, if yes, check again . . . until the binding 
	is obtained (a sketch appears after this list).  This is only necessary in 
	case inner SETQs are to SPECIALs since otherwise the SETQ disappears.  
11.  Algorithm for binding variables in CANON6 must be changed to reflect new 
	correct method.  Basically, need a third representation, the binding 
	representation.  Also, no longer need SETL, DOL for LAMBDA handling.  
12.  Flow analysis must not consider (RPLACA <special variable> <binding>) as 
	an access operation on the SPECIAL <special_variable>.  
13.  In CHAP4.PUB should probably give definition of a tree (root, left subtree, 
	right subtree).  Mention that a LISP program is like a lattice in numbering 
	scheme.  Canonical form is not unique if consider substitution of EQ items 
	for other EQ items (mention that uniqueness can occur via use of 
	lexicographic minimum but not worthwhile since matching phase will change 
	to fit - should go in second paragraph of Adaptation to LISP).  
14.  Emit a GENSYM for 'FLAGS - i.e. CSYM(FLAGS000) initially.  Must make sure that 
	equality check works OK for 'FLAGS.  I believe yes.  Need to fix SUBHALF1 in 
	case subtract two identical FLAGS.  Also in description of CALL, PUSHJ we 
	need to use GETFLAGs() instead of FORMDATAPOINTER('FLAGS).  GETFLAGS 
	creates a datapointer with a GENSYMed FLAGS value and saves FLAGS counter.  
15.  No CONS(CAR(A),CDR(A)) EQ A because not true since no hashing - this should be 
	indicated in CHAP3.PUB.  
16.  Use NEXTINSTRUCTION() instead of NIL in routine descriptions in Appendix.  
17.  In CHAP5.PUB when discussing EQSUB in restrictions section note that the 
	appearance of the construct as a non-result argument to an FN construct 
	means that arguments are placed on the FN list.  I believe that this is currently 
	done.  DERIV6 should also be modified to place the operands to operations 
	which yield results of type UNKNOWN on UNREFG if they have not been 
	previously referenced.  This should probably be mentioned in CHAP5.PUB.  
	This must be done so that we don't lose track of some LISP computations.  
18.  CHAP3.PUB has an error in section on when two items are EQ.  Specifically, if 
	the function does modification of heads (tails) then it cannot access heads 
	(tails) and there must be no intervening modification of heads (tails).  
	We leave the error uncorrected to see if it will be caught.  
19.  CHAP5.PUB mention in measurement discussion comments about effect of stack size 
	on garbage collection and other comments in Knuth notes (Sept. 24,1974).  
20.  Super LISP by John Allen notes (Sept. 26,1974) has a blurb about COND not being 
	a function and also about a characterization of functions in terms of total 
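
A sketch of the fix intended in item 10, in modern LISP notation (the helper name 
is hypothetical):  

	(DEFUN SETQ-BINDING (FORM) 
	  ;; chase nested SETQs down to the underlying binding, so that 
	  ;; (SETQ A (SETQ B C)) binds A to C 
	  (COND ((AND (NOT (ATOM FORM)) (EQ (CAR FORM) (QUOTE SETQ))) 
	         (SETQ-BINDING (CADDR FORM))) 
	        (T FORM))) 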
	and partial.